Creators/Authors contains: "Wu, Mengyun"

  1. Abstract

    Clustering is a critical component of single-cell RNA sequencing (scRNA-seq) data analysis and can help reveal cell types and infer cell lineages. Despite considerable successes, few methods are tailored to identifying the cluster-specific genes that contribute to cell heterogeneity, even though such genes can advance its biological understanding. In this study, we propose a zero-inflated negative binomial mixture model (ZINBMM) that simultaneously achieves effective scRNA-seq data clustering and gene selection. ZINBMM conducts a systemic analysis on raw counts, accommodating both batch effects and dropout events. Simulations and the analysis of five scRNA-seq datasets demonstrate the practical applicability of ZINBMM.

     
  2. Summary

    In recent biomedical research, genome-wide association studies (GWAS) have demonstrated great success in investigating the genetic architecture of human diseases. For many complex diseases, multiple correlated traits have been collected. However, most existing GWAS remain limited: they analyze each trait separately, ignoring the correlations among traits, and therefore suffer from insufficient information. Moreover, the high dimensionality of single nucleotide polymorphism (SNP) data still poses tremendous challenges to statistical methods, both theoretically and practically. In this article, we propose an integrative functional linear model for GWAS with multiple traits. This study is the first to approximate SNPs as functional objects in a joint model of multiple traits with penalization techniques. It effectively accommodates the high dimensionality of SNPs and the correlations among multiple traits to facilitate information borrowing. Our extensive simulation studies demonstrate the satisfactory performance of the proposed method in the identification and estimation of disease-associated genetic variants, compared to four alternatives. The analysis of type 2 diabetes data leads to biologically meaningful findings with good prediction accuracy and selection stability.
  3. Abstract

    Gene expression data have played an essential role in many biomedical studies. When the number of genes is large and sample size is limited, there is a ‘lack of information’ problem, leading to low-quality findings. To tackle this problem, both horizontal and vertical data integrations have been developed, where vertical integration methods collectively analyze data on gene expressions as well as their regulators (such as mutations, DNA methylation and miRNAs). In this article, we conduct a selective review of vertical data integration methods for gene expression data. The reviewed methods cover both marginal and joint analysis and supervised and unsupervised analysis. The main goal is to provide a sketch of the vertical data integration paradigm without digging into too many technical details. We also briefly discuss potential pitfalls, directions for future developments and application notes.
  4. Abstract

    Multiple types of molecular (genetic, genomic, epigenetic, etc.) measurements, environmental risk factors, and their interactions have been found to contribute to the outcomes and phenotypes of complex diseases. In each of the previous studies, only the interactions between one type of molecular measurement and environmental risk factors have been analyzed. In recent biomedical studies, multidimensional profiling, in which data from multiple types of molecular measurements are collected from the same subjects, is becoming popular. A myriad of recent studies have shown that collectively analyzing multiple types of molecular measurements is not only biologically sensible but also leads to improved estimation and prediction. In this study, we conduct an M–E interaction analysis, with M standing for multidimensional molecular measurements and E standing for environmental risk factors. This can accommodate multiple types of molecular measurements and sufficiently account for their overlapping as well as independent information. Extensive simulation shows that it outperforms several closely related alternatives. In the analysis of TCGA (The Cancer Genome Atlas) data on lung adenocarcinoma and cutaneous melanoma, we make some stable biological findings and achieve stable prediction.

     
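The first abstract names a zero-inflated negative binomial (ZINB) distribution as the building block of ZINBMM. As a rough illustration only, the sketch below evaluates the textbook ZINB probability mass function for a single count. The parameter names `mu` (mean), `theta` (dispersion), and `pi` (probability of an excess zero, such as a dropout event) are assumptions chosen for this example. The abstract does not give the model's actual parameterization, and the mixture components, batch-effect terms, and gene selection of ZINBMM are not represented here.

```python
import math


def nb_pmf(k: int, mu: float, theta: float) -> float:
    """Negative binomial pmf with mean mu > 0 and dispersion theta > 0.

    Uses the mean/dispersion form common for count data:
    variance = mu + mu**2 / theta.
    """
    log_p = (
        math.lgamma(k + theta)
        - math.lgamma(theta)
        - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def zinb_pmf(k: int, mu: float, theta: float, pi: float) -> float:
    """Zero-inflated NB pmf: with probability pi the count is an excess zero,
    otherwise it is drawn from the negative binomial above."""
    excess_zero = pi if k == 0 else 0.0
    return excess_zero + (1.0 - pi) * nb_pmf(k, mu, theta)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the effect of zero inflation.
    for k in range(4):
        print(k, nb_pmf(k, 2.0, 1.5), zinb_pmf(k, 2.0, 1.5, 0.3))
```

Zero inflation raises the probability of observing a zero above what the negative binomial alone would give, which is the usual way of accommodating dropout events in raw count data.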